Conversation
the three most frequently read chain_info fields: fork_db_root_num, head_num, and fork_db_head_num. These scalar getters are called on every block/trx/vote receive path across all net threads, making the exclusive mutex a potentially significant contention point.
…tion which also takes a lock.
…eliminate exclusive mutex acquisition in the read-only write_queue_size() method, which is called on every read cycle in start_read_message.
thread pool, saving the received QC immediately on the critical path before fork_db_.add(). Only consider_voting is posted asynchronously. Eliminates thread pool scheduling delay from block processing latency.
…ueued message Replace the std::function<void(ec, size_t)> callback in queued_write with concrete struct fields (connection_id, close_after_send, net_msg, block_num) and inline the callback logic into clear_out_queue. This removes the per-message heap allocation caused by the lambda closure exceeding std::function's SBO limit, along with the shared_from_this() refcount bump/release in enqueue_buffer. At 10K TPS with 25 peers: eliminates ~250K heap alloc/dealloc pairs and ~500K atomic refcount operations per second.
…th stack-local small_vector. peer_connections() returned connection_id_set (unordered_flat_set<uint32_t>) by value, heap-allocating on every transaction broadcast. Switch return type to boost::container::small_vector<connection_id_t, 64> which holds up to 64 connection IDs (256 bytes) on the caller's stack frame via RVO, avoiding heap allocation in the common case.
…cast hot paths. Every transaction/vote broadcast previously posted a lambda per peer to its strand, heap-allocating ~64 bytes and copying two shared_ptrs (8 atomic ops per peer). At 25 peers and 1000 TPS this was 25K heap allocs/s + 200K atomic ops/s. add_write_queue is already mutex-protected, and _out_queue.empty() under the same lock atomically tells us whether an async_write completion handler will drain the queue. When a write IS in flight, newly queued messages are picked up automatically — no strand post needed. When the connection is idle, a single coalesced drain post (guarded by _write_drain_pending atomic in queued_buffer) kicks do_queue_write.
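The coalesced-drain idea in the paragraph above can be sketched as follows. This is a minimal stand-in, not the real `net_plugin` code: `queued_buffer_sketch` is hypothetical, and the `posts` counter replaces the `boost::asio::post` of `do_queue_write` to the connection's strand.

```cpp
#include <atomic>
#include <deque>
#include <mutex>
#include <string>

// Sketch of the coalesced drain post described above. `posts` stands in for
// boost::asio::post to the connection's strand; everything else is hypothetical.
struct queued_buffer_sketch {
   std::mutex              mtx;
   std::deque<std::string> out_queue;
   std::atomic<bool>       write_drain_pending{false};
   int                     posts = 0; // counts strand posts that would occur

   void queue_write_mt( std::string msg ) {
      bool need_post = false;
      {
         std::lock_guard g( mtx );
         // If a write is already in flight (queue non-empty under the same lock),
         // its completion handler will drain this message: no post needed.
         need_post = out_queue.empty();
         out_queue.push_back( std::move( msg ) );
      }
      // Coalesce: only one drain post may be outstanding at a time.
      if( need_post && !write_drain_pending.exchange( true ) )
         ++posts; // real code: boost::asio::post(strand, [c]{ c->do_queue_write(); });
   }
};
```

The key property is that back-to-back enqueues on an idle connection produce exactly one drain post; the drain itself would clear `write_drain_pending` before writing.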
Every inbound packed_transaction triggers make_shared + fc::raw::unpack + reflector_init (6-8 heap allocations) before checking the dedup cache. In a gossip network most messages are duplicates, so these allocations are created and immediately freed — ~24K-32K allocator calls/s wasted at 2000 dups/s. Add a fast path in process_next_trx_message that peeks the wire bytes via mb_peek_datastream and computes the transaction ID directly from the packed_trx SHA-256 hash without deserializing. For the common case (k1/r1/em/ed signatures, compression_type::none), this identifies duplicates with zero heap allocations.

- trx_dedup.hpp: templated parse_trx_dedup_info() skips signatures by known fixed sizes, reads compression byte, skips context_free_data, then hashes packed_trx bytes via sha256::encoder + 512-byte stack buffer. static_assert guards against new signature types and transaction_header field reordering.
- dispatch_manager::have_txn(): read-only local_txns lookup (no insertion, no connection_id tracking, at worst causes one redundant notice to this peer).
- connection::try_early_dedup(): peek-parse-check-consume pipeline, falls back to the full unpack path for webauthn/bls signatures, zlib compression, or any parse error.
This reverts commit 5f99839.
Replace packed_transaction with transaction_message{id, trx}.
The sender prepends the 32-byte transaction ID before the packed_transaction
payload; the receiver peeks those bytes and checks the dedup table before any heap
allocation. Duplicate transactions (the common case in gossip) now skip
make_shared + unpack + reflector_init entirely, eliminating
~6-8 heap allocations per duplicate (~24K-32K wasted calls/s at 2000 dups/s).
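The receiver-side peek this commit describes can be sketched as below, assuming fc's `unsigned_int` is a ULEB128-style varint (7-bit groups, high-bit continuation). All types here are simplified stand-ins for the fc/sysio ones, and the toy hash is only for the sketch's set.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Stand-in for transaction_id_type (a 32-byte SHA-256 digest in the real code).
using trx_id_t = std::array<uint8_t, 32>;

struct trx_id_hash {
   size_t operator()( const trx_id_t& id ) const {
      size_t h = 0;
      for( uint8_t b : id ) h = h * 131 + b; // toy hash, sketch only
      return h;
   }
};

// Decode a ULEB128-style varint at buf[pos], advancing pos.
inline uint64_t read_varint( const std::vector<uint8_t>& buf, size_t& pos ) {
   uint64_t v = 0; int shift = 0;
   while( buf[pos] & 0x80 ) { v |= uint64_t(buf[pos++] & 0x7f) << shift; shift += 7; }
   v |= uint64_t(buf[pos++]) << shift;
   return v;
}

// Peek [which][trx_id] from the front of a message buffer without touching the
// packed_transaction payload; returns true if the trx is a known duplicate.
inline bool is_duplicate( const std::vector<uint8_t>& wire,
                          const std::unordered_set<trx_id_t, trx_id_hash>& seen ) {
   size_t pos = 0;
   read_varint( wire, pos );            // variant discriminator, value unused here
   trx_id_t id{};
   std::copy_n( wire.begin() + pos, 32, id.begin() );
   return seen.count( id ) > 0;         // zero heap allocations on this path
}
```

On the duplicate path the real code then simply advances the read pointer past `message_length` bytes and returns.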
Every inbound vote_message triggers make_shared<vote_message>() + full fc::raw::unpack including two BLS Montgomery field conversions (~1.1 us) before any duplicate check. In a 21-finalizer network with only 10 peers, ~378 duplicate votes/s arrive, each wasting CPU on deserialization that is immediately discarded. Same pattern as the transaction_message optimization: prepend a 32-byte vote_id (SHA-256 of block_id || strong || finalizer_key, excluding the signature which is deterministic) on the wire. The receiver peeks the ID, checks a bounded dedup cache, and only deserializes for new votes.

- protocol.hpp: add vote_id_type alias and compute_vote_id() helper
- buffer_factory.hpp: add vote_buffer_factory (prepends vote_id on wire)
- net_plugin.cpp: add vote_dedup multi_index cache to dispatch_manager with hashed_unique<vote_id> + ordered_non_unique<block_num> indices; restructure process_next_vote_message() to peek-and-check before deserialization; validate wire vote_id matches computed on full path; prune cache entries in expire_blocks() at LIB advancement
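The shape of `compute_vote_id()` can be sketched as follows. FNV-1a stands in for SHA-256 here (the real helper hashes with fc's sha256), and the field types are stand-ins; only the hashed-field choice, block_id || strong || finalizer_key with the signature excluded, follows the commit text.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Stand-in for the fields that identify a vote. The BLS signature is excluded
// from the ID: it is deterministic given these fields.
struct vote_fields {
   std::string block_id;      // 32-byte block id in the real code
   bool        strong;
   std::string finalizer_key; // BLS public key bytes in the real code
};

// FNV-1a as a stand-in for SHA-256; each step is a bijection on the running
// state, so differing inputs of equal layout yield differing ids.
inline uint64_t fnv1a( const void* data, size_t len, uint64_t h = 1469598103934665603ull ) {
   auto p = static_cast<const unsigned char*>( data );
   for( size_t i = 0; i < len; ++i ) { h ^= p[i]; h *= 1099511628211ull; }
   return h;
}

// Hash block_id || strong || finalizer_key into one deterministic id.
inline uint64_t compute_vote_id( const vote_fields& v ) {
   uint64_t h = fnv1a( v.block_id.data(), v.block_id.size() );
   unsigned char s = v.strong ? 1 : 0;
   h = fnv1a( &s, 1, h );
   return fnv1a( v.finalizer_key.data(), v.finalizer_key.size(), h );
}
```

Determinism is the point: sender and receiver compute the same id from the same fields, so the receiver can also verify the wire prefix against the deserialized vote on the full path.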
Pull request overview
This PR optimizes net_plugin hot paths by reducing allocations/locking and enabling zero-allocation duplicate detection for transactions and votes via new wire prefixes.
Changes:
- Introduces new wire formats for `transaction_message` and `vote_message` with fixed ID prefixes to enable early duplicate drops.
- Reduces contention/allocations in send/receive paths (atomics for chain info and write queue size, small_vector buffers, reduced posting/callback overhead).
- Adds unit tests validating the new wire formats and ID “peek” behavior.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/trx_generator/trx_provider.cpp | Updates trx generator to emit the new [which][trx_id][packed_transaction] payload format |
| plugins/net_plugin/test/net_msg_wire_unittest.cpp | Adds unit tests for trx/vote wire formats and ID peeking |
| plugins/net_plugin/test/CMakeLists.txt | Registers the new wire-format unit test |
| plugins/net_plugin/src/net_plugin.cpp | Main performance changes: atomics, new dedup caches, new send/receive behavior for trx/vote, write-queue changes |
| plugins/net_plugin/include/sysio/net_plugin/protocol.hpp | Adds transaction_message variant and compute_vote_id helper |
| plugins/net_plugin/include/sysio/net_plugin/buffer_factory.hpp | Updates tx buffer construction to include wire ID; adds vote buffer factory with vote_id prefix |
| libraries/chain/controller.cpp | Makes QC integration synchronous before fork_db_.add() while keeping voting async |
```diff
- _write_queue_size = 0;
+ _write_queue_size.store(0, std::memory_order_relaxed);
  _trx_write_queue.clear();
  _out_queue.clear();
```
queued_buffer::reset() clears the queues and _write_queue_size but does not reset _write_drain_pending. If a previous queue_write_mt() set _write_drain_pending=true, subsequent queue_write_mt() calls can fail to post a drain and messages may remain queued until some other strand activity triggers do_queue_write(). Reset should clear _write_drain_pending (same as clear_write_queue()).
```diff
  _out_queue.clear();
+ _write_drain_pending.store(false, std::memory_order_relaxed);
```
```cpp
// Early dedup: check if we already have this transaction — zero heap allocations on the duplicate path.
// Peek the transaction ID (first field after variant which) for zero-allocation dedup.
// Wire format: [which (varint)][transaction_id (32 bytes)][packed_transaction ...]
auto peek_ds = pending_message_buffer.create_peek_datastream();
unsigned_int which{};
fc::raw::unpack( peek_ds, which );
transaction_id_type trx_id;
fc::raw::unpack( peek_ds, trx_id );
if( my_impl->dispatcher.have_txn( trx_id ) ) {
   peer_dlog( p2p_trx_log, this, "got a duplicate transaction - dropping {}", trx_id );
   pending_message_buffer.advance_read_ptr( message_length );
   return true;
}
```
The early-duplicate fast path drops the message without recording that this peer has announced/sent the transaction. That skips dispatch_manager::add_peer_txn()/add_peer_txn_notice(), so connection_ids and per-connection trx_entries_size accounting won't be updated. This can increase redundant rebroadcasts to the same peer and weakens the per-peer tracking/limit behavior. Consider still recording the peer (e.g., via a lightweight notice/add) before advancing the read pointer.
```cpp
my_impl->dispatcher.add_vote_id( vote_id, block_header::num_from_id(ptr->block_id) );
handle_message( ptr );
```
vote_id is inserted into the global dedup cache before the vote has been validated by the chain (signature, finality rules, etc.). A malformed/invalid vote can therefore poison the cache and cause later valid votes with the same (block_id,strong,key) to be dropped, and can also grow the cache with arbitrary (including far-future) block_nums before pruning can occur. Only add to vote_dedup after successful validation/acceptance, and/or enforce bounds on block_num before caching.
```diff
- my_impl->dispatcher.add_vote_id( vote_id, block_header::num_from_id(ptr->block_id) );
  handle_message( ptr );
+ my_impl->dispatcher.add_vote_id( vote_id, block_header::num_from_id(ptr->block_id) );
```
```diff
  const send_buffer_type& sb = buff_factory.get_send_buffer( trx );
  fc_dlog( p2p_trx_log, "sending trx: {}, to connection - {}, size {}", trx->id(), cp->connection_id, sb->size() );
- boost::asio::post(cp->strand, [cp, sb]() {
-    cp->enqueue_buffer( msg_type_t::packed_transaction, std::nullopt, queued_buffer::queue_t::general, sb, go_away_reason::no_reason );
- } );
+ cp->queue_write_mt( msg_type_t::transaction_message, queued_buffer::queue_t::general, sb, go_away_reason::no_reason );
} );
```
bcast_transaction now unconditionally sends msg_type_t::transaction_message. Since this PR changes the wire format (adds a trx_id prefix and replaces packed_transaction in net_message), this is not backward-compatible with peers speaking the previous protocol. The net plugin typically gates wire/behavior changes on proto_version_t/network_version; consider bumping the protocol version and only using transaction_message with peers that negotiated support (and maintaining a compatibility path during upgrades if needed).
Launching a new chain. No need for backwards compatibility.
```diff
  boost::asio::post( thread_pool.get_executor(), [exclude_peer, msg, this]() mutable {
-    buffer_factory buff_factory;
+    vote_buffer_factory buff_factory;
     const auto& send_buffer = buff_factory.get_send_buffer( *msg );
+    dispatcher.add_vote_id( buff_factory.get_vote_id(), block_header::num_from_id(msg->block_id) );
     dispatcher.bcast_vote_msg( exclude_peer, send_buffer );
  });
```
send_vote now serializes vote_message as [which][vote_id][vote_message]. This changes the on-the-wire representation of vote_message but is still only gated by proto_version_t::savanna elsewhere. Unless all savanna peers are upgraded simultaneously, this needs a new protocol version (and conditional send/receive) to avoid disconnects or decode failures when talking to older savanna nodes.
Launching a new chain. No need for backwards compatibility. Next PR cleans up the proto versions.
…as sent the transaction, preventing redundant rebroadcasts back to the same peer and keeping per-connection tracking consistent.
The vote_id was being inserted into the dedup cache before chain validation (BLS signature, finality rules). A malformed vote could poison the cache and cause valid votes with the same (block_id, strong, key) tuple to be silently dropped. The post-validation path in bcast_vote_message already calls add_vote_id on success, so the early insert was redundant for valid votes. Removing it ensures only chain-validated votes enter the cache.
```cpp
// Validate that the wire ID matches the actual transaction ID.
if( ptr->id() != trx_id ) {
   peer_wlog( p2p_trx_log, this, "transaction_message ID mismatch: wire={} actual={}", trx_id, ptr->id() );
```
I think "calculated" may be clearer than "actual". Only change if you agree. Same thing goes for the vote dedup code
hmm, I think I like actual better. Calculated makes me think we figured it out somehow. The actual is how it is actually generated.
Summary
Elimination of heap allocations, mutex contention, and redundant computation from `net_plugin` hot paths: block/transaction/vote receive and broadcast. These are the highest-frequency code paths in `net_plugin`, executing thousands of times per second across all peer connections.
Changes
Atomic scalars for chain_info reads
Replace exclusive mutex acquisition with `std::atomic<uint32_t>` for `fork_db_root_num`, `head_num`, and `fork_db_head_num`. These are read on every block/trx/vote receive across all net threads.

Estimated savings: ~25ns per read x 3 fields x every message = ~75ns per inbound message (eliminates mutex contention across N threads)
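A sketch of the pattern, under the assumption that independent relaxed loads are acceptable at the call sites: the struct and getter names follow the text, the rest is a stand-in for the real net_plugin code.

```cpp
#include <atomic>
#include <cstdint>

// Sketch of the atomic chain-info pattern: the writer publishes each scalar
// with a relaxed store; readers on net threads load without taking a mutex.
struct chain_info_atomic {
   std::atomic<uint32_t> fork_db_root_num{0};
   std::atomic<uint32_t> head_num{0};
   std::atomic<uint32_t> fork_db_head_num{0};

   // Writer side: called when chain info is updated.
   void update( uint32_t root, uint32_t head, uint32_t fork_head ) {
      fork_db_root_num.store( root, std::memory_order_relaxed );
      head_num.store( head, std::memory_order_relaxed );
      fork_db_head_num.store( fork_head, std::memory_order_relaxed );
   }

   // Hot-path readers: each field is an independent scalar, so relaxed loads
   // suffice when callers only need a recent value, not a consistent snapshot
   // of all three (which is what the old mutex provided).
   uint32_t get_fork_db_root_num() const { return fork_db_root_num.load( std::memory_order_relaxed ); }
   uint32_t get_head_num() const         { return head_num.load( std::memory_order_relaxed ); }
   uint32_t get_fork_db_head_num() const { return fork_db_head_num.load( std::memory_order_relaxed ); }
};
```

The trade-off versus the mutex is that a reader can observe the three fields mid-update; for monotonically advancing block numbers read individually, that is typically tolerable, but it is a per-call-site judgment.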
Reduce nested lock acquisition in `peer_connections()`

Move `peer_connections()` call (which acquires its own lock) out of `for_each_connection` (which holds `connections_mtx`), eliminating redundant nested locking.

Atomic `_write_queue_size`

Convert `queued_buffer::_write_queue_size` to `std::atomic<uint32_t>`, eliminating exclusive mutex acquisition in the read-only `write_queue_size()` called on every read cycle in `start_read_message`.

Estimated savings: ~25ns per read cycle per connection, fires on every inbound TCP read
Synchronous `integrate_received_qc_to_block`

Run QC integration synchronously on the critical path before `fork_db_.add()` instead of posting to thread pool. Only consider_voting is posted asynchronously.

Estimated savings: Eliminates thread pool scheduling delay (~1-5us) from block processing latency
Atomic for `queued_buffer::_out_queue` empty check

Replace mutex-guarded `_out_queue.empty()` read with atomic flag.

Estimated savings: ~25ns per enqueue call per connection
Eliminate `std::function` in `queued_write`

Replace `std::function<void(boost::system::error_code, std::size_t)>` callback stored per queued message with a compact struct holding an enum tag + optional fields. Eliminates heap allocation from `std::function` SBO overflow.

Estimated savings: ~50-100ns per enqueued message (heap alloc + dealloc). At 1000 TPS x 25 peers = 1.25-2.5ms/s saved
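A sketch of the idea: plain data fields replace the type-erased callback, and the old per-message callback logic is inlined into the drain loop (`clear_out_queue` in the real code). Field names follow the PR text; `drain_result` and the surrounding types are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <memory>
#include <optional>
#include <vector>

using send_buffer_type = std::shared_ptr<std::vector<char>>;

// Concrete fields instead of std::function<void(error_code, size_t)>: the
// closure those callbacks captured exceeded std::function's small-buffer
// optimization, costing a heap allocation per queued message.
struct queued_write {
   send_buffer_type        buff;
   uint32_t                connection_id    = 0;
   bool                    close_after_send = false;
   std::optional<uint32_t> block_num;    // set for signed_block messages
};

struct drain_result {
   size_t   bytes        = 0;
   uint32_t blocks_acked = 0;
   bool     should_close = false;
};

// The per-message logic the callbacks used to carry, now branching on fields.
inline drain_result drain( std::deque<queued_write>& q ) {
   drain_result r;
   while( !q.empty() ) {
      queued_write& w = q.front();
      r.bytes += w.buff->size();
      if( w.block_num ) ++r.blocks_acked;     // was: lambda captured block_num
      if( w.close_after_send ) r.should_close = true;
      q.pop_front();
   }
   return r;
}
```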
Stack-local `small_vector` for `peer_connections()`

Replace `unordered_flat_set<uint32_t>` returned by value (heap-allocating) with `small_vector<connection_id_t, 64>` (256 bytes on stack via RVO).

Estimated savings: ~80-120ns per transaction broadcast (heap alloc + dealloc eliminated)
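Why this is heap-free can be shown without Boost: a minimal fixed-capacity stand-in for `boost::container::small_vector<connection_id_t, 64>` below keeps its elements in an inline array, so RVO constructs the return value directly in the caller's frame. (The real small_vector spills to the heap past 64 elements; this sketch simply caps there, and `peer_connections_sketch` is hypothetical.)

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using connection_id_t = uint32_t;

// Inline-storage vector: elements live inside the object, not on the heap.
struct inline_id_vec {
   std::array<connection_id_t, 64> data{};
   size_t                          count = 0;

   void push_back( connection_id_t id ) {
      if( count < data.size() ) data[count++] = id;
   }
   size_t size() const { return count; }
   const connection_id_t* begin() const { return data.data(); }
   const connection_id_t* end() const   { return data.data() + count; }
};

// Sketch of peer_connections(): gather ids (under connections_mtx in the real
// code) and return by value, zero allocations for up to 64 peers.
inline inline_id_vec peer_connections_sketch( size_t num_peers ) {
   inline_id_vec ids;
   for( size_t i = 0; i < num_peers; ++i )
      ids.push_back( static_cast<connection_id_t>( i + 1 ) );
   return ids; // RVO: constructed in the caller's stack frame
}
```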
Eliminate per-peer `boost::asio::post` lambda heap allocations

Pre-bind broadcast lambdas to avoid per-peer lambda capture heap allocations in block/trx/vote broadcast fan-out loops. With 25 peers, each broadcast previously created 25 separate lambda heap allocations.

Estimated savings: ~50-80ns x 25 peers = ~1.25-2us per broadcast event
Wire-level transaction ID prefix for zero-allocation duplicate dedup
Embed the transaction ID as a fixed prefix in the `transaction_message` wire format, allowing the receiver to extract the 32-byte ID directly from the network buffer without deserializing the full packed transaction. Duplicate transactions (the common case in a gossip network) are rejected via a single memcmp + index lookup with zero heap allocation.

Estimated savings: ~200-500ns per duplicate transaction (eliminates `fc::raw::unpack` of `packed_transaction` + `packed_transaction_ptr` construction). At 80%+ duplicate rate with 1000 TPS x 25 peers: ~5-12ms/s saved

Wire-level vote ID prefix for zero-allocation duplicate vote dedup
Same approach as transactions: embed a vote ID (SHA-256 of key fields) as a wire prefix, allowing duplicate votes to be rejected from the raw network buffer without BLS signature deserialization.
Estimated savings: ~2-5us per duplicate vote (eliminates BLS pubkey + signature deserialization). With 25 finalizers x 25 peers at 80%+ duplicate rate: ~1-3ms/s saved
small_vector for write buffer descriptors
Replace `std::vector<boost::asio::const_buffer>` in `do_queue_write` with `small_vector<const_buffer, 16>`. Eliminates 2 heap allocations per drain cycle (original vector + copy into async_write's internal write_op).

Estimated savings: ~80-120ns x 2 = ~160-240ns per write drain cycle per connection
Aggregate Performance Estimate
Under production load (1000 TPS, 25 finalizers, 25 peers, ~80% duplicate rate):